Use of pitch pattern improvement in the CHATR speech synthesis system
Abstract
A corpus-based concatenative speech synthesis system using no signal processing can produce intelligible synthetic speech maintaining original voice characteristics, but it can sometimes be difficult to realize natural prosody. In such a concatenative system, it is very important to select appropriate waveform segments that are naturally close to the target prosody. This paper describes some approaches to unit selection for improving the prosody, especially intonation, of such synthetic speech. If the unit selection measures for the fundamental frequency (F0) are insufficient, the concatenative system may produce speech having a discontinuous F0 pattern. Our proposed solution to this problem is to add extra measures for selecting units that form a smoother, more continuous F0 contour. Through subjective experiments, we confirmed that each of these measures effectively improved intonation naturalness.
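The idea of adding an F0 continuity measure to unit selection can be illustrated with a minimal sketch. It is not CHATR's actual cost function: the names `Unit`, `target_cost`, `join_cost`, `select_units`, and `join_weight` are hypothetical, and the use of a log-F0 distance at unit boundaries is an assumption made only for illustration. The search is a standard dynamic-programming (Viterbi-style) pass over candidate units.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical candidate waveform segment with F0 at its two edges."""
    name: str
    f0_start: float  # F0 in Hz at the left edge of the unit
    f0_end: float    # F0 in Hz at the right edge of the unit


def target_cost(unit, target_f0):
    # Distance between the unit's mean F0 and the target F0 (log domain).
    mean_f0 = 0.5 * (unit.f0_start + unit.f0_end)
    return abs(math.log(mean_f0) - math.log(target_f0))


def join_cost(prev, cur):
    # Assumed continuity measure: F0 jump across the concatenation point.
    return abs(math.log(prev.f0_end) - math.log(cur.f0_start))


def select_units(candidates, targets, join_weight=1.0):
    """Return (total_cost, path) minimizing target cost plus weighted join cost.

    candidates: one list of Unit per target position
    targets:    one target F0 value (Hz) per position
    """
    # best[i] holds the cheapest path ending in the i-th candidate so far.
    best = [(target_cost(u, targets[0]), [u]) for u in candidates[0]]
    for cands, tgt in zip(candidates[1:], targets[1:]):
        new_best = []
        for u in cands:
            cost, path = min(
                ((c + join_weight * join_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(u, tgt), path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])
```

With `join_weight=0.0` the search considers only how close each unit is to the target F0, so it can pick a unit whose edges jump away from its neighbour. Raising `join_weight` penalizes such jumps and favours sequences whose boundary F0 values line up, which is the kind of smoother contour the abstract describes.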
Similar resources
CHATR: a generic speech synthesis system
This paper describes a generic speech synthesis system called CHATR which is being developed at ATR. CHATR is designed in a modular way: module parameters, and even which modules are actually used, may be set and selected at run-time. Although some interdependencies exist between modules, CHATR offers a useful research tool in which functionally equivalent modules may be easily compared. It also ...
Improving speech synthesis of CHATR using a perceptual discontinuity function and constraints of prosodic modification
Concatenative synthesis is widely used in TTS to generate synthetic speech with high quality and relatively natural-sounding prosody. Whatever the type of synthesis unit used (diphone, phoneme, etc.), a large speech database is usually needed to ensure the phonetic and phonemic variation of the units in a rich variety of contexts. In the CHATR synthesis system, unit selection finds the most appr...
Factors affecting perceived quality and intelligibility in the CHATR concatenative speech synthesiser
In order to eliminate trial-and-error in the process of selecting a good speech database as a voice source for concatenative speech synthesis, and to determine the acoustic and prosodic characteristics that best predict 'appeal' or perceived 'quality' in the synthesised speech, we performed tests to evaluate listener preferences over a range of different synthesised voices. We found that variati...
The Function of Pitch Range Variations in Samples of Emotional Expressions in Persian
This study aims at investigating the interface between emotion and intonation patterns (more specifically, duration and pitch amplitude of speech). To this end, the acoustic properties of spectral parameters related to speech prosody are investigated. The results of acoustic and statistical analysis show that mean level and range of F0 in the contours vary strongly as a function of the degree o...
Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language
The setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of anger and happiness on speech was evaluated and the results were compared with neutral speech. To this end, the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...